Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut

Authors

  • Nathan Schneider
  • Emily Danchik
  • Chris Dyer
  • Noah A. Smith
Abstract

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.
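To make the idea of a chunking representation extended with gaps more concrete, here is a minimal sketch. It assumes a simplified six-tag scheme: uppercase `O`, `B`, `I` as in standard chunking, plus lowercase `o`, `b`, `i` for tokens that fall inside the gap of another expression. The function name `encode_gappy_mwes`, the example sentence, and the restriction to non-interleaved expressions are illustrative assumptions; the tagset used in the paper itself may differ in detail.

```python
from typing import List, Sequence


def encode_gappy_mwes(n_tokens: int, mwes: Sequence[Sequence[int]]) -> List[str]:
    """Return one tag per token for a sentence with (possibly gappy) MWEs.

    Each MWE is a list of token indices and may be non-contiguous.
    Assumption (hypothetical simplification): expressions do not interleave,
    and an expression nested inside a gap has no gap of its own.
    """
    # Mark tokens that sit strictly inside the span of an MWE
    # without being members of that MWE: these are gap tokens.
    in_gap = [False] * n_tokens
    for mwe in mwes:
        members = set(mwe)
        for t in range(min(mwe) + 1, max(mwe)):
            if t not in members:
                in_gap[t] = True

    # Standard chunking tags: B for the first token of an MWE,
    # I for every later token of it, O for everything else.
    tags = ["O"] * n_tokens
    for mwe in mwes:
        ordered = sorted(mwe)
        tags[ordered[0]] = "B"
        for t in ordered[1:]:
            tags[t] = "I"

    # Lowercase the tag of any token that lies inside a gap.
    return [tag.lower() if in_gap[t] else tag for t, tag in enumerate(tags)]


# Illustrative sentence (not from the paper):
#   She looked the phone number up
#   0   1      2   3     4      5
# "looked ... up" is a gappy MWE; "phone number" sits inside its gap.
example_tags = encode_gappy_mwes(6, [[1, 5], [3, 4]])
print(example_tags)
```

Because every token still receives exactly one tag, a sentence encoded this way remains an ordinary tag sequence, which is what allows standard sequence tagging algorithms to be applied.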

Similar resources

Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework

We explore the consequences of representing token segmentations as hierarchical structures (trees) for the task of Multiword Expression (MWE) recognition, in isolation or in combination with dependency parsing. We propose a novel representation of token segmentation as trees on tokens, resembling dependency trees. Given this new representation, we present and evaluate two different architecture...

The Role of Self-Regulatory Approach in Iranian Learners' Lexical Segmentation: The case of authentic materials

The present research investigated the effect of self-regulatory approach (with two components of self-checking and self-efficacy) on pre-intermediate Iranian learners' lexical segmentation in listening comprehension via authentic listening comprehension texts. To achieve this purpose, the investigators administered an Oxford Placement Test (2007) to ninety-eight students of two girls’ private j...

A Large Semantic Lexicon for Corpus Annotation

Semantic lexical resources play an important part in both corpus linguistics and NLP. Over the past 14 years, a large semantic lexical resource has been built at Lancaster University. Different from other major semantic lexicons in existence, such as WordNet, EuroWordNet and HowNet, etc., in which lexemes are clustered and linked via the relationship between word/MWE senses or definitions of me...

Towards Best Practice for Multiword Expressions in Computational Lexicons

The importance and role of multi-word expressions (MWE) in the description and processing of natural language has been long recognized. However, multi-word information has often been relegated to the marginal role of idiosyncratic lexical information. The need for MWE lexicons grows even more acute for multi-lingual applications, for which (sometimes complex) correspondences must be identified,...

Journal:
  • TACL

Volume 2

Publication date: 2014